In-class Ex 3

Author

Stephen Tay

Published

September 9, 2024

Modified

September 10, 2024

Overview

In this in-class exercise, we will highlight common mistakes and blind spots encountered when working with geospatial data or utilizing R’s geospatial packages.

pacman::p_load(sf, spNetwork, tmap, tidyverse)

1. Importing & Transforming Data

First, let’s import two geospatial datasets that we will be working with:

  • Punggol_St: A line feature dataset representing the road network within the Punggol Planning Area.
  • Punggol_CC: A point feature dataset representing the locations of childcare centers within the Punggol Planning Area.

The road network is provided in ESRI shapefile format. We use st_read() to load the file, without the need to specify the shapefile extension.

network <- st_read(dsn="data/geospatial", 
                   layer="Punggol_St")
Reading layer `Punggol_St' from data source 
  `/Users/stephentay/stephentay/ISSS626-Geospatial-Analytics/In-class_Ex/In-class_Ex03/data/geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 2642 features and 2 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 34038.56 ymin: 40941.11 xmax: 38882.85 ymax: 44801.27
Projected CRS: SVY21 / Singapore TM

Using st_read() to load the childcare shapefile returned a message indicating that the coordinate dimension is XYZ, meaning it is in 3D. However, for our analysis with the spNetwork package, the coordinates must be 2-dimensional; otherwise, the analysis will not work.

childcare <- st_read(dsn="data/geospatial", layer="Punggol_CC")
Reading layer `Punggol_CC' from data source 
  `/Users/stephentay/stephentay/ISSS626-Geospatial-Analytics/In-class_Ex/In-class_Ex03/data/geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 61 features and 1 field
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 34423.98 ymin: 41503.6 xmax: 37619.47 ymax: 44685.77
z_range:       zmin: 0 zmax: 0
Projected CRS: SVY21 / Singapore TM

The 3D point coordinates can be converted to 2-dimensional by using the st_zm() function, which removes the z-coordinates.

childcare <- st_read(dsn="data/geospatial", layer="Punggol_CC") %>%
  st_zm(drop = TRUE, what = "ZM")
Reading layer `Punggol_CC' from data source 
  `/Users/stephentay/stephentay/ISSS626-Geospatial-Analytics/In-class_Ex/In-class_Ex03/data/geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 61 features and 1 field
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 34423.98 ymin: 41503.6 xmax: 37619.47 ymax: 44685.77
z_range:       zmin: 0 zmax: 0
Projected CRS: SVY21 / Singapore TM
childcare
Simple feature collection with 61 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 34423.98 ymin: 41503.6 xmax: 37619.47 ymax: 44685.77
Projected CRS: SVY21 / Singapore TM
First 10 features:
      Name                  geometry
1   kml_10 POINT (36173.81 42550.33)
2   kml_99 POINT (36479.56 42405.21)
3  kml_100 POINT (36618.72 41989.13)
4  kml_101 POINT (36285.37 42261.42)
5  kml_122  POINT (35414.54 42625.1)
6  kml_161 POINT (36545.16 42580.09)
7  kml_172 POINT (35289.44 44083.57)
8  kml_188 POINT (36520.56 42844.74)
9  kml_205  POINT (36924.01 41503.6)
10 kml_222 POINT (37141.76 42326.36)

2. GeoVisualisation

Always plot the road network first, then the points. Use add = TRUE to overlay the points on the road network.

plot(st_geometry(network))
plot(childcare, add=TRUE, col='red', pch = 19)

If you use plot(network) instead of plot(st_geometry(network)), it will attempt to plot every column in the network sf dataset, which is not useful. Therefore, the correct approach is to use plot(st_geometry(network)).

If the color for the point sf dataset has been specified (e.g. ‘red’), there’s no need to use plot(st_geometry(childcare)), as the color is already applied to each column in the childcare sf dataset.

plot(network)
plot(childcare, add=TRUE, col='red', pch = 19)

  • tmap_mode('plot') generates a static map.
  • For each layer, you need to specify the map layer using tm_shape().
  • Use tm_symbols() or tm_markers() when customizing point images with PNG files.
  • For standard point displays, use tm_square(), tm_bubbles(), or tm_dots().
tmap_mode('plot')
tm_shape(childcare) + 
  tm_dots(col = 'red') + 
  tm_shape(network) +
  tm_lines()

tmap_mode('view') generates an interactive map. Remember to switch back to static mode by ending with tmap_mode('plot').

tmap_mode('view')
tm_shape(childcare) + 
  tm_dots() + 
  tm_shape(network) +
  tm_lines()
tmap_mode('plot')

3. Network Kernel Density Estimation (NKDE) Analysis

A lixel length of 700m was chosen based on research regarding walkable distances in the Singaporean environment. The minimum distance was set at 350m, which is arbitrarily half of the 700m.

The road network contains 2,642 line geometries. With a lixel length of 700m and a minimum distance of 350m, 2,645 lixels are generated.

lixels <- lixelize_lines(network, 
                         lx_length = 700, 
                         mindist = 350)
nrow(lixels)
[1] 2645

When the minimum distance is reduced to 50m, 2,648 lixels are generated.

lixels <- lixelize_lines(network, 
                         lx_length = 700, 
                         mindist = 50)
nrow(lixels)
[1] 2648

With a lixel length of 1000m and a minimum distance of 150m, 2,645 lixels are generated

lixels <- lixelize_lines(network, 
                         lx_length = 1000, 
                         mindist = 150)
nrow(lixels)
[1] 2645
Important

The lixel length should be meaningful and reflective of real-world phenomena (e.g., 700m for walkable distance). It’s also important that the lixel length is not shorter than the distance between data points. Best practice: Use the nearest neighbor method and test the 10th, 20th, or 25th percentile of the distance between points to determine the optimal lixel length.

You may plot your lixels using the code chunk below:

samples <- lines_center(lixels) 

tmap_mode('view')
tm_shape(lixels) + 
  tm_lines() + 
  tm_shape(samples) +
  tm_dots(size = 0.01)
tmap_mode('plot')

There are three methods for computing NKDE: method = "simple" for the simple method, method = "discontinuous" for the discontinuous method, and method = "continuous" for the continuous method.

The continuous method works best for grid networks, but it is very time-consuming. For networks consisting of parallel straight highways, any of the methods will yield similar results.

In the code chunk below, the simple method is used.

simple_densities <- nkde(lines = network, events = childcare,
                  w = rep(1, nrow(childcare)), samples = samples,
                  kernel_name = "quartic", bw = 300, 
                  div= "bw", method = "simple", 
                  digits = 1, tol = 1,
                  grid_shape = c(1,1), max_depth = 8,
                  agg = 5, sparse = TRUE,
                  verbose = FALSE)

The densities represent the number of childcare centers per meter, but it is more meaningful to express them as centers per kilometer. Since the network consists of line geometries, you can convert the density by multiplying by 1,000. (Note: for planar KDE, where densities are measured per square meter, the conversion would require multiplying by 1,000 * 1,000).

samples$simple_density <- simple_densities*1000
lixels$simple_density <- simple_densities*1000

tmap_mode('view')
tm_shape(lixels) +
  tm_lines(col="simple_density") +
  tm_shape(childcare) +
  tm_dots()
tmap_mode('plot')

4. Network-constrained G- and K-function analysis

A test for complete spatial randomness (CSR) can be conducted using network-constrained G- and K-functions, both of which can be implemented with spNetwork’s kfunctions().

Note that the simulation index starts at 0, so to perform 1,000 Monte Carlo simulations, you should set nsim to 999.

set.seed(2024)
kfun_childcare <- kfunctions(network, 
                             childcare,
                             start = 0, 
                             end = 1000, 
                             step = 50, 
                             width = 50, 
                             nsim = 999, 
                             resolution = 50,
                             verbose = FALSE, 
                             conf_int = 0.05)

The output of kfunctions() includes both the K-function and G-function plots. Use the code below to generate the K-function plot.

Here’s how to interpret the plot: The blue line represents the empirical network K-function of the childcare centers in Punggol, while the gray area shows the results of 1,000 simulations within the 2.5% - 97.5% confidence interval. Since the blue line falls below the lower boundary of the envelope between the 200-400m distance range, it indicates a regular pattern of childcare center locations along the Punggol road network. The childcare centers are regularly spaced between 200-400m.

kfun_childcare$plotk

The output of kfunctions() includes both the K-function and G-function plots. Use the code below to generate the G-function plot.

kfun_childcare$plotg